Forked from the tutorial at EuroSciPy 2015 by Joris Van den Bossche (Ghent University, Belgium)
Licensed under CC BY 4.0 Creative Commons
If you want to follow along, this is a notebook that you can view or run yourself:
You need pandas >= 0.15.2 (the easiest way to get it is to install Anaconda).
Some imports:
In [7]:
    
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.options.display.max_rows = 8
    
In [2]:
    
data = pd.read_csv('data/airbase_data.csv', index_col=0, parse_dates=True, na_values='-9999')
    
In [3]:
    
data
    
    Out[3]:
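To see what the `read_csv` arguments used above do, here is a minimal sketch on an in-memory file. The column names (`station_a`, `station_b`) and all values are invented for illustration and are not taken from `airbase_data.csv`; only the arguments `index_col`, `parse_dates` and `na_values` mirror the call above.

```python
import io
import pandas as pd

# Hypothetical miniature of an hourly measurement file; values are made up.
csv_text = """datetime,station_a,station_b
2012-01-01 00:00:00,21.0,-9999
2012-01-01 01:00:00,-9999,18.5
2012-01-01 02:00:00,19.0,17.0
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,        # use the first column as the row index
    parse_dates=True,   # parse that index as datetimes
    na_values='-9999',  # treat the sentinel -9999 as missing (NaN)
)

print(type(sample.index).__name__)
print(sample.isnull().sum())  # number of missing values per column
```

Marking the sentinel as missing at read time keeps it out of later means and comparisons, which would otherwise be distorted by the value -9999.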
We can go from reading this data to answering questions about it in a few lines of code:
Does the air pollution show a decreasing trend over the years?
In [4]:
    
data['1999':].resample('A').plot(ylim=[0,100])
    
    Out[4]:
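The cell above relies on the behaviour of the pandas version this tutorial targets, where `resample('A')` averages by default. From pandas 0.18 on, `resample` returns a deferred object and the aggregation has to be named explicitly. The sketch below shows that form on synthetic data; the frame `demo`, its column `station_a` and its values are assumptions made for illustration, and the plot call is left out.

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a slow downward drift; not real measurements.
idx = pd.date_range('1999-01-01', '2001-12-31', freq='D')
values = np.linspace(60.0, 30.0, len(idx))
demo = pd.DataFrame({'station_a': values}, index=idx)

# State the aggregation explicitly: one mean value per calendar year.
# 'YS' labels each bin by the start of the year; the cell above uses 'A',
# which labels each bin by the end of the year.
annual = demo.loc['1999':].resample('YS').mean()
print(annual)
```

On a recent pandas, calling `.plot(ylim=[0, 100])` on `annual` should give the same kind of figure as the original cell.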
    
How many exceedances of the limit values?
In [5]:
    
exceedances = data > 200
exceedances = exceedances.groupby(exceedances.index.year).sum()
ax = exceedances.loc[2005:].plot(kind='bar')
ax.axhline(18, color='k', linestyle='--')
    
    Out[5]:
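The counting step works because comparing a frame with a number gives a boolean frame, and summing booleans counts the `True` values. A small sketch on made-up numbers, with a hypothetical column `station_a`, makes the intermediate results visible:

```python
import pandas as pd

# Made-up readings for one hypothetical station, spread over two years.
idx = pd.to_datetime(['2005-03-01 10:00', '2005-07-15 14:00',
                      '2006-01-20 08:00', '2006-06-02 16:00',
                      '2006-06-02 17:00'])
demo = pd.DataFrame({'station_a': [150.0, 230.0, 210.0, 90.0, 250.0]},
                    index=idx)

# Boolean frame: True wherever the value exceeds 200.
above_limit = demo > 200

# Group the rows by calendar year and sum; each True counts as 1.
per_year = above_limit.groupby(above_limit.index.year).sum()
print(per_year)
```

Because the grouped index holds plain year numbers, a selection such as `.loc[2005:]` in the cell above slices by year label.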
    
What is the difference in diurnal profile between weekdays and weekend?
In [6]:
    
data['weekday'] = data.index.weekday
data['weekend'] = data['weekday'].isin([5, 6])
data_weekend = data.groupby(['weekend', data.index.hour])['FR04012'].mean().unstack(level=0)
data_weekend.plot()
    
    Out[6]:
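To show the shape of the table that gets plotted, here is the same group-then-unstack pattern on four invented readings. The column name `station_a` and the values are assumptions; 2012-01-06 is a Friday and 2012-01-07 a Saturday.

```python
import pandas as pd

# Two made-up days (a Friday and a Saturday), two readings each.
idx = pd.to_datetime(['2012-01-06 08:00', '2012-01-06 18:00',
                      '2012-01-07 08:00', '2012-01-07 18:00'])
demo = pd.DataFrame({'station_a': [50.0, 40.0, 20.0, 30.0]}, index=idx)

# weekday is 0 for Monday through 6 for Sunday, so 5 and 6 mark the weekend.
demo['weekend'] = demo.index.weekday.isin([5, 6])

# Mean per (weekend flag, hour of day), then move the weekend flag
# from the row index into the columns.
profile = (demo.groupby(['weekend', demo.index.hour])['station_a']
               .mean()
               .unstack(level=0))
print(profile)
```

The result has one row per hour and one column per value of the weekend flag, so plotting it draws one line for weekdays and one for weekends.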
    
We will come back to these examples and build them up step by step.
For data-intensive work in Python the Pandas library has become essential.
What is pandas?
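Pandas can be thought of as R's data.frame in Python.
Its documentation: http://pandas.pydata.org/pandas-docs/stable/
Key features:
Working with missing data (.dropna(), pd.isnull())
Merging and joining (concat, join)
Grouping: groupby functionality
Reshaping (stack, pivot)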
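As a rough illustration of the functions named above, the following sketch runs `concat`, `pd.isnull()`, `.dropna()`, `groupby` and `pivot` on toy frames. All frame names, column names and values are invented for the example.

```python
import numpy as np
import pandas as pd

# Toy data; station names and values are invented for illustration.
a = pd.DataFrame({'station': ['s1', 's1'], 'value': [1.0, np.nan]})
b = pd.DataFrame({'station': ['s2', 's2'], 'value': [3.0, 5.0]})

# Combining: stack the two frames on top of each other.
both = pd.concat([a, b], ignore_index=True)

# Missing data: count the NaN values, then drop the rows that contain them.
n_missing = pd.isnull(both['value']).sum()
clean = both.dropna()

# Grouping: mean value per station.
means = clean.groupby('station')['value'].mean()

# Reshaping: turn a long table into a wide one, one column per station.
tidy = pd.DataFrame({'time': ['t1', 't1', 't2', 't2'],
                     'station': ['s1', 's2', 's1', 's2'],
                     'value': [1.0, 3.0, 2.0, 4.0]})
wide = tidy.pivot(index='time', columns='station', values='value')

print(means)
print(wide)
```

`stack` goes in the opposite direction to `pivot`: it moves the column labels of a wide table into an inner level of the row index.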